Overview

Dataset statistics

Number of variables: 16
Number of observations: 84038
Missing cells: 16885
Missing cells (%): 1.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 10.3 MiB
Average record size in memory: 128.0 B
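
The percentage and memory figures above follow from simple arithmetic on the counts. A minimal sketch, assuming the missing-cell percentage is taken over all cells (rows × columns) and that 1 MiB is 2**20 bytes; the variable names are illustrative only:

```python
# Figures copied from the dataset statistics above.
n_vars = 16
n_obs = 84038
missing_cells = 16885
record_size_bytes = 128.0  # 16 columns at 8 bytes each

total_cells = n_vars * n_obs
missing_pct = 100 * missing_cells / total_cells

# Assumption: total size = observations x average record size.
total_mib = n_obs * record_size_bytes / 2**20

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Total size in memory: {total_mib:.1f} MiB")
```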

Variable types

NUM: 9
CAT: 7

Warnings

geneSymbol has a high cardinality: 9703 distinct values (High cardinality)
diseaseId has a high cardinality: 11181 distinct values (High cardinality)
diseaseName has a high cardinality: 11181 distinct values (High cardinality)
diseaseClass has a high cardinality: 755 distinct values (High cardinality)
source has a high cardinality: 51 distinct values (High cardinality)
DPI is highly correlated with DSI (High correlation)
DSI is highly correlated with DPI (High correlation)
diseaseClass has 3637 (4.3%) missing values (Missing)
EI has 4383 (5.2%) missing values (Missing)
YearInitial has 4383 (5.2%) missing values (Missing)
YearFinal has 4383 (5.2%) missing values (Missing)
NofSnps is highly skewed (γ1 = 89.28782133) (Skewed)
NofPmids has 7131 (8.5%) zeros (Zeros)
NofSnps has 75760 (90.1%) zeros (Zeros)
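
Warnings of this kind are threshold checks on per-column statistics. The sketch below illustrates the idea only. The threshold values are assumptions chosen to be consistent with the warnings listed above (for example, `source` is flagged at 51 distinct values and NofSnps at a skewness of about 89); they are not read from config.yaml, and `column_warnings` is a hypothetical helper, not a pandas-profiling function.

```python
# Hypothetical thresholds, chosen only to be consistent with the warnings above.
CARDINALITY_THRESHOLD = 50   # distinct values, categorical columns only
SKEWNESS_THRESHOLD = 20.0    # absolute skewness
FRACTION_THRESHOLD = 0.01    # share of rows, used for missing values and zeros


def column_warnings(n_rows, n_distinct=None, n_missing=0, n_zeros=0, skewness=None):
    """Return warning labels for one column's summary statistics."""
    labels = []
    if n_distinct is not None and n_distinct > CARDINALITY_THRESHOLD:
        labels.append("High cardinality")
    if n_missing / n_rows > FRACTION_THRESHOLD:
        labels.append("Missing")
    if skewness is not None and abs(skewness) > SKEWNESS_THRESHOLD:
        labels.append("Skewed")
    if n_zeros / n_rows > FRACTION_THRESHOLD:
        labels.append("Zeros")
    return labels


N_ROWS = 84038
source_warnings = column_warnings(N_ROWS, n_distinct=51)
nofsnps_warnings = column_warnings(N_ROWS, n_zeros=75760, skewness=89.28782133)
dsi_warnings = column_warnings(N_ROWS, n_missing=40, skewness=0.3534637234)
print(source_warnings, nofsnps_warnings, dsi_warnings)
```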

Reproduction

Analysis started: 2020-11-18 22:52:29.404232
Analysis finished: 2020-11-18 22:53:13.162224
Duration: 43.76 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml
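
The reported duration can be checked against the two timestamps. A small sketch using only the standard library; the format string is an assumption that matches the timestamps as printed above:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2020-11-18 22:52:29.404232", FMT)
finished = datetime.strptime("2020-11-18 22:53:13.162224", FMT)

# Elapsed wall-clock time between start and finish, in seconds.
duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```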

Variables

geneId
Real number (ℝ≥0)

Distinct: 9703
Distinct (%): 11.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 719625.0238
Minimum: 1
Maximum: 109580095
Zeros: 0
Zeros (%): 0.0%
Memory size: 656.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 434
Q1: 2662
Median: 5468
Q3: 10457
95th percentile: 220972
Maximum: 109580095
Range: 109580094
Interquartile range (IQR): 7795
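
The range and IQR are derived from the other quantiles: here IQR = Q3 - Q1 = 10457 - 2662 = 7795 and Range = 109580095 - 1 = 109580094. The sketch below shows one way such a summary could be computed. It assumes percentiles use linear interpolation between neighbouring order statistics (the pandas default); `percentile` and `quantile_summary` are illustrative names, not part of the report's tooling.

```python
def percentile(sorted_values, q):
    """q-th percentile (0-100) of pre-sorted data, with linear interpolation."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = (len(sorted_values) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def quantile_summary(values):
    """Minimum, selected percentiles, maximum, range and IQR of a sample."""
    data = sorted(values)
    q1, q3 = percentile(data, 25), percentile(data, 75)
    return {
        "min": data[0],
        "p5": percentile(data, 5),
        "q1": q1,
        "median": percentile(data, 50),
        "q3": q3,
        "p95": percentile(data, 95),
        "max": data[-1],
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
    }


summary = quantile_summary([1, 2, 4, 7, 11, 16, 22, 29, 37])
print(summary)
```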

Descriptive statistics

Standard deviation: 8284285.301
Coefficient of variation (CV): 11.51194723
Kurtosis: 142.0155591
Mean: 719625.0238
Median Absolute Deviation (MAD): 3412
Skewness: 11.99847004
Sum: 6.047584775e+10
Variance: 6.862938294e+13
Monotonicity: Increasing
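
Several of these figures are tied together: the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. The sketch below is a minimal illustration under stated assumptions: it uses the sample standard deviation (n - 1 denominator, as pandas does by default), takes MAD as the median of absolute deviations from the median, and treats a non-decreasing sequence as "Increasing". The function name is hypothetical.

```python
import statistics


def descriptive_summary(values):
    """Descriptive statistics for a list of numbers, in their given order."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])

    # Monotonicity is judged on the row order, not on sorted values.
    pairs = list(zip(values, values[1:]))
    if all(a <= b for a, b in pairs):
        monotonicity = "Increasing"
    elif all(a >= b for a, b in pairs):
        monotonicity = "Decreasing"
    else:
        monotonicity = "Not monotonic"

    return {
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,
        "mad": mad,
        "monotonicity": monotonicity,
    }


stats = descriptive_summary([1, 2, 2, 3, 10])
print(stats)

# Consistency check on the values reported for geneId above.
reported_cv = 8284285.301 / 719625.0238
print(round(reported_cv, 4))
```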

[Figure: Histogram with fixed size bins (bins=50)]

Most frequent values
Value | Count | Frequency (%)
7124 | 340 | 0.4%
6648 | 285 | 0.3%
3569 | 270 | 0.3%
5443 | 241 | 0.3%
5743 | 239 | 0.3%
7157 | 232 | 0.3%
3553 | 231 | 0.3%
4524 | 192 | 0.2%
4843 | 184 | 0.2%
5728 | 182 | 0.2%
Other values (9693) | 81642 | 97.1%

Smallest values
Value | Count | Frequency (%)
1 | 2 | < 0.1%
2 | 26 | < 0.1%
9 | 14 | < 0.1%
10 | 39 | < 0.1%
12 | 3 | < 0.1%

Largest values
Value | Count | Frequency (%)
109580095 | 5 | < 0.1%
107305681 | 1 | < 0.1%
107075310 | 1 | < 0.1%
106783499 | 1 | < 0.1%
106481323 | 2 | < 0.1%
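
The frequency tables list the ten most common values and fold everything else into an "Other values (k)" row, where k is the number of remaining distinct values (here 9703 - 10 = 9693, covering 84038 - 2396 = 81642 rows). A minimal sketch of that aggregation follows; `frequency_table` is an illustrative helper, and missing values are not handled.

```python
from collections import Counter


def frequency_table(values, top_n=10):
    """Rows of (label, count, percent) for the top_n values plus a remainder row."""
    counts = Counter(values)
    total = len(values)
    top = counts.most_common(top_n)
    rows = [(value, count, 100 * count / total) for value, count in top]

    n_other = len(counts) - len(top)
    if n_other > 0:
        other_count = total - sum(count for _, count in top)
        rows.append((f"Other values ({n_other})", other_count, 100 * other_count / total))
    return rows


genes = ["TNF"] * 3 + ["IL6"] * 2 + ["A1BG", "A2M"]
table = frequency_table(genes, top_n=2)
for label, count, pct in table:
    print(f"{label} | {count} | {pct:.1f}%")
```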
 

geneSymbol
Categorical

HIGH CARDINALITY

Distinct: 9703
Distinct (%): 11.5%
Missing: 0
Missing (%): 0.0%
Memory size: 656.5 KiB

Top categories
Category | Count
TNF | 340
SOD2 | 285
IL6 | 270
POMC | 241
PTGS2 | 239
Other values (9698) | 82663

Most frequent values
Value | Count | Frequency (%)
TNF | 340 | 0.4%
SOD2 | 285 | 0.3%
IL6 | 270 | 0.3%
POMC | 241 | 0.3%
PTGS2 | 239 | 0.3%
TP53 | 232 | 0.3%
IL1B | 231 | 0.3%
MTHFR | 192 | 0.2%
NOS2 | 184 | 0.2%
PTEN | 182 | 0.2%
Other values (9693) | 81642 | 97.1%

[Figure: Frequencies of value counts]

Unique

Unique: 2161
Unique (%): 2.6%

[Figure: Histogram of lengths of the category]

Length

Max length: 12
Median length: 5
Mean length: 4.83180228
Min length: 2

DSI
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 320
Distinct (%): 0.4%
Missing: 40
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5388791281
Minimum: 0.231
Maximum: 1
Zeros: 0
Zeros (%): 0.0%
Memory size: 656.5 KiB

Quantile statistics

Minimum: 0.231
5th percentile: 0.324
Q1: 0.445
Median: 0.533
Q3: 0.623
95th percentile: 0.78
Maximum: 1
Range: 0.769
Interquartile range (IQR): 0.178

Descriptive statistics

Standard deviation: 0.1341011444
Coefficient of variation (CV): 0.248851992
Kurtosis: 0.1746056657
Mean: 0.5388791281
Median Absolute Deviation (MAD): 0.088
Skewness: 0.3534637234
Sum: 45264.769
Variance: 0.01798311694
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Most frequent values
Value | Count | Frequency (%)
0.445 | 756 | 0.9%
0.529 | 727 | 0.9%
0.379 | 646 | 0.8%
0.821 | 611 | 0.7%
0.628 | 597 | 0.7%
0.659 | 583 | 0.7%
0.674 | 578 | 0.7%
0.7 | 575 | 0.7%
0.619 | 574 | 0.7%
0.653 | 573 | 0.7%
Other values (310) | 77778 | 92.6%

Smallest values
Value | Count | Frequency (%)
0.231 | 340 | 0.4%
0.236 | 232 | 0.3%
0.248 | 270 | 0.3%
0.266 | 167 | 0.2%
0.276 | 231 | 0.3%

Largest values
Value | Count | Frequency (%)
1 | 284 | 0.3%
0.931 | 376 | 0.4%
0.89 | 430 | 0.5%
0.861 | 458 | 0.5%
0.839 | 550 | 0.7%

DPI
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 25
Distinct (%): < 0.1%
Missing: 59
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.6934339299
Minimum: 0.038
Maximum: 0.962
Zeros: 0
Zeros (%): 0.0%
Memory size: 656.5 KiB

Quantile statistics

Minimum: 0.038
5th percentile: 0.231
Q1: 0.577
Median: 0.769
Q3: 0.846
95th percentile: 0.923
Maximum: 0.962
Range: 0.924
Interquartile range (IQR): 0.269

Descriptive statistics

Standard deviation: 0.2123399344
Coefficient of variation (CV): 0.3062150917
Kurtosis: 0.3762738081
Mean: 0.6934339299
Median Absolute Deviation (MAD): 0.116
Skewness: -1.05059721
Sum: 58233.888
Variance: 0.04508824776
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=25)]

Most frequent values
Value | Count | Frequency (%)
0.808 | 9643 | 11.5%
0.885 | 9087 | 10.8%
0.846 | 8748 | 10.4%
0.769 | 6862 | 8.2%
0.731 | 5823 | 6.9%
0.923 | 5675 | 6.8%
0.692 | 4921 | 5.9%
0.654 | 3982 | 4.7%
0.962 | 3410 | 4.1%
0.615 | 3378 | 4.0%
Other values (15) | 22450 | 26.7%

Smallest values
Value | Count | Frequency (%)
0.038 | 139 | 0.2%
0.077 | 733 | 0.9%
0.115 | 790 | 0.9%
0.154 | 800 | 1.0%
0.192 | 1008 | 1.2%

Largest values
Value | Count | Frequency (%)
0.962 | 3410 | 4.1%
0.923 | 5675 | 6.8%
0.885 | 9087 | 10.8%
0.846 | 8748 | 10.4%
0.808 | 9643 | 11.5%

diseaseId
Categorical

HIGH CARDINALITY

Distinct: 11181
Distinct (%): 13.3%
Missing: 0
Missing (%): 0.0%
Memory size: 656.5 KiB

Top categories
Category | Count
C0006142 | 1074
C0036341 | 883
C0023893 | 774
C0009402 | 702
C0033578 | 616
Other values (11176) | 79989

Most frequent values
Value | Count | Frequency (%)
C0006142 | 1074 | 1.3%
C0036341 | 883 | 1.1%
C0023893 | 774 | 0.9%
C0009402 | 702 | 0.8%
C0033578 | 616 | 0.7%
C0376358 | 616 | 0.7%
C0678222 | 538 | 0.6%
C1458155 | 527 | 0.6%
C4704874 | 525 | 0.6%
C1257931 | 525 | 0.6%
Other values (11171) | 77258 | 91.9%

[Figure: Frequencies of value counts]

Unique

Unique: 6427
Unique (%): 7.6%

[Figure: Histogram of lengths of the category]

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

diseaseName
Categorical

HIGH CARDINALITY

Distinct: 11181
Distinct (%): 13.3%
Missing: 0
Missing (%): 0.0%
Memory size: 656.5 KiB

Top categories
Category | Count
Malignant neoplasm of breast | 1074
Schizophrenia | 883
Liver Cirrhosis, Experimental | 774
Colorectal Carcinoma | 702
Prostatic Neoplasms | 616
Other values (11176) | 79989

Most frequent values
Value | Count | Frequency (%)
Malignant neoplasm of breast | 1074 | 1.3%
Schizophrenia | 883 | 1.1%
Liver Cirrhosis, Experimental | 774 | 0.9%
Colorectal Carcinoma | 702 | 0.8%
Prostatic Neoplasms | 616 | 0.7%
Malignant neoplasm of prostate | 616 | 0.7%
Breast Carcinoma | 538 | 0.6%
Mammary Neoplasms | 527 | 0.6%
Mammary Neoplasms, Human | 525 | 0.6%
Mammary Carcinoma, Human | 525 | 0.6%
Other values (11171) | 77258 | 91.9%

[Figure: Frequencies of value counts]

Unique

Unique: 6427
Unique (%): 7.6%

[Figure: Histogram of lengths of the category]

Length

Max length: 177
Median length: 23
Mean length: 24.18607059
Min length: 4

diseaseType
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 656.5 KiB

Most frequent values
Value | Count | Frequency (%)
disease | 60478 | 72.0%
phenotype | 13653 | 16.2%
group | 9907 | 11.8%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 9
Median length: 7
Mean length: 7.089150146
Min length: 5

diseaseClass
Categorical

HIGH CARDINALITY
MISSING

Distinct: 755
Distinct (%): 0.9%
Missing: 3637
Missing (%): 4.3%
Memory size: 656.5 KiB

Top categories
Category | Count
C04 | 6576
C23;C10 | 5434
C06;C04 | 4201
F03 | 4177
C04;C17 | 3546
Other values (750) | 56467

Most frequent values
Value | Count | Frequency (%)
C04 | 6576 | 7.8%
C23;C10 | 5434 | 6.5%
C06;C04 | 4201 | 5.0%
F03 | 4177 | 5.0%
C04;C17 | 3546 | 4.2%
C25;F03 | 2623 | 3.1%
C14 | 2546 | 3.0%
C10 | 2464 | 2.9%
C06;C25 | 2445 | 2.9%
C23 | 2269 | 2.7%
Other values (745) | 44120 | 52.5%
(Missing) | 3637 | 4.3%

[Figure: Frequencies of value counts]

Unique

Unique: 223
Unique (%): 0.3%

[Figure: Histogram of lengths of the category]

Length

Max length: 43
Median length: 7
Mean length: 6.534210714
Min length: 3

diseaseSemanticType
Categorical

Distinct: 28
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 656.5 KiB

Top categories
Category | Count
Disease or Syndrome | 36204
Neoplastic Process | 22380
Mental or Behavioral Dysfunction | 9078
Pathologic Function | 3383
Sign or Symptom | 3035
Other values (23) | 9958

Most frequent values
Value | Count | Frequency (%)
Disease or Syndrome | 36204 | 43.1%
Neoplastic Process | 22380 | 26.6%
Mental or Behavioral Dysfunction | 9078 | 10.8%
Pathologic Function | 3383 | 4.0%
Sign or Symptom | 3035 | 3.6%
Finding | 2833 | 3.4%
Congenital Abnormality | 2791 | 3.3%
Experimental Model of Disease | 1309 | 1.6%
Injury or Poisoning | 1198 | 1.4%
Neoplastic Process; Experimental Model of Disease | 694 | 0.8%
Other values (18) | 1133 | 1.3%

[Figure: Frequencies of value counts]

Unique

Unique: 3
Unique (%): < 0.1%

[Figure: Histogram of lengths of the category]

Length

Max length: 51
Median length: 19
Mean length: 20.20069492
Min length: 7

score
Real number (ℝ≥0)

Distinct: 70
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3570343178
Minimum: 0.3
Maximum: 1
Zeros: 0
Zeros (%): 0.0%
Memory size: 656.5 KiB

Quantile statistics

Minimum: 0.3
5th percentile: 0.3
Q1: 0.3
Median: 0.3
Q3: 0.34
95th percentile: 0.7
Maximum: 1
Range: 0.7
Interquartile range (IQR): 0.04

Descriptive statistics

Standard deviation: 0.1231498229
Coefficient of variation (CV): 0.3449243301
Kurtosis: 7.601567973
Mean: 0.3570343178
Median Absolute Deviation (MAD): 0
Skewness: 2.744105646
Sum: 30004.45
Variance: 0.01516587888
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Most frequent values
Value | Count | Frequency (%)
0.3 | 51663 | 61.5%
0.31 | 6240 | 7.4%
0.4 | 5635 | 6.7%
0.32 | 2802 | 3.3%
0.5 | 2051 | 2.4%
0.7 | 1876 | 2.2%
0.6 | 1800 | 2.1%
0.33 | 1766 | 2.1%
0.34 | 1126 | 1.3%
0.35 | 808 | 1.0%
Other values (60) | 8271 | 9.8%

Smallest values
Value | Count | Frequency (%)
0.3 | 51663 | 61.5%
0.31 | 6240 | 7.4%
0.32 | 2802 | 3.3%
0.33 | 1766 | 2.1%
0.34 | 1126 | 1.3%

Largest values
Value | Count | Frequency (%)
1 | 380 | 0.5%
0.99 | 10 | < 0.1%
0.98 | 15 | < 0.1%
0.97 | 15 | < 0.1%
0.96 | 18 | < 0.1%

EI
Real number (ℝ≥0)

MISSING

Distinct: 187
Distinct (%): 0.2%
Missing: 4383
Missing (%): 5.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9915316176
Minimum: 0
Maximum: 1
Zeros: 45
Zeros (%): 0.1%
Memory size: 656.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0.967
Q1: 1
Median: 1
Q3: 1
95th percentile: 1
Maximum: 1
Range: 1
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.05104972624
Coefficient of variation (CV): 0.05148572706
Kurtosis: 130.6567067
Mean: 0.9915316176
Median Absolute Deviation (MAD): 0
Skewness: -9.996256584
Sum: 78980.451
Variance: 0.002606074549
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Most frequent values
Value | Count | Frequency (%)
1 | 74279 | 88.4%
0.5 | 279 | 0.3%
0.8 | 259 | 0.3%
0.667 | 220 | 0.3%
0.75 | 216 | 0.3%
0.857 | 148 | 0.2%
0.833 | 135 | 0.2%
0.875 | 121 | 0.1%
0.909 | 115 | 0.1%
0.9 | 107 | 0.1%
Other values (177) | 3776 | 4.5%
(Missing) | 4383 | 5.2%

Smallest values
Value | Count | Frequency (%)
0 | 45 | 0.1%
0.2 | 2 | < 0.1%
0.25 | 5 | < 0.1%
0.333 | 21 | < 0.1%
0.4 | 11 | < 0.1%

Largest values
Value | Count | Frequency (%)
1 | 74279 | 88.4%
0.997 | 3 | < 0.1%
0.996 | 2 | < 0.1%
0.995 | 5 | < 0.1%
0.994 | 21 | < 0.1%

YearInitial
Real number (ℝ≥0)

MISSING

Distinct: 73
Distinct (%): 0.1%
Missing: 4383
Missing (%): 5.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 2007.110878
Minimum: 1924
Maximum: 2020
Zeros: 0
Zeros (%): 0.0%
Memory size: 656.5 KiB

Quantile statistics

Minimum: 1924
5th percentile: 1993
Q1: 2003
Median: 2008
Q3: 2013
95th percentile: 2018
Maximum: 2020
Range: 96
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 7.905874962
Coefficient of variation (CV): 0.003938932845
Kurtosis: 2.99199418
Mean: 2007.110878
Median Absolute Deviation (MAD): 5
Skewness: -1.265467877
Sum: 159876417
Variance: 62.50285892
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Most frequent values
Value | Count | Frequency (%)
2011 | 5793 | 6.9%
2010 | 5643 | 6.7%
2008 | 4572 | 5.4%
2009 | 4560 | 5.4%
2006 | 3939 | 4.7%
2014 | 3897 | 4.6%
2007 | 3866 | 4.6%
2005 | 3631 | 4.3%
2012 | 3457 | 4.1%
2013 | 3388 | 4.0%
Other values (63) | 36909 | 43.9%
(Missing) | 4383 | 5.2%

Smallest values
Value | Count | Frequency (%)
1924 | 1 | < 0.1%
1940 | 2 | < 0.1%
1944 | 1 | < 0.1%
1947 | 1 | < 0.1%
1951 | 1 | < 0.1%

Largest values
Value | Count | Frequency (%)
2020 | 50 | 0.1%
2019 | 1133 | 1.3%
2018 | 2957 | 3.5%
2017 | 2899 | 3.4%
2016 | 2528 | 3.0%

YearFinal
Real number (ℝ≥0)

MISSING

Distinct: 56
Distinct (%): 0.1%
Missing: 4383
Missing (%): 5.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 2011.928215
Minimum: 1962
Maximum: 2020
Zeros: 0
Zeros (%): 0.0%
Memory size: 656.5 KiB

Quantile statistics

Minimum: 1962
5th percentile: 2001
Q1: 2008
Median: 2013
Q3: 2017
95th percentile: 2019
Maximum: 2020
Range: 58
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 6.496091576
Coefficient of variation (CV): 0.003228788943
Kurtosis: 3.364900169
Mean: 2011.928215
Median Absolute Deviation (MAD): 5
Skewness: -1.330282612
Sum: 160260142
Variance: 42.19920576
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Most frequent values
Value | Count | Frequency (%)
2019 | 9348 | 11.1%
2018 | 7132 | 8.5%
2017 | 5717 | 6.8%
2011 | 5371 | 6.4%
2010 | 4985 | 5.9%
2014 | 4418 | 5.3%
2015 | 4221 | 5.0%
2016 | 4074 | 4.8%
2009 | 3877 | 4.6%
2008 | 3690 | 4.4%
Other values (46) | 26822 | 31.9%
(Missing) | 4383 | 5.2%

Smallest values
Value | Count | Frequency (%)
1962 | 4 | < 0.1%
1966 | 1 | < 0.1%
1967 | 1 | < 0.1%
1968 | 2 | < 0.1%
1969 | 3 | < 0.1%

Largest values
Value | Count | Frequency (%)
2020 | 2807 | 3.3%
2019 | 9348 | 11.1%
2018 | 7132 | 8.5%
2017 | 5717 | 6.8%
2016 | 4074 | 4.8%

NofPmids
Real number (ℝ≥0)

ZEROS

Distinct: 69
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.488017325
Minimum: 0
Maximum: 136
Zeros: 7131
Zeros (%): 8.5%
Memory size: 656.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 1
Q3: 1
95th percentile: 4
Maximum: 136
Range: 136
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.430120834
Coefficient of variation (CV): 1.633126706
Kurtosis: 339.3925622
Mean: 1.488017325
Median Absolute Deviation (MAD): 0
Skewness: 13.29523115
Sum: 125050
Variance: 5.905487267
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Most frequent values
Value | Count | Frequency (%)
1 | 61273 | 72.9%
2 | 7638 | 9.1%
0 | 7131 | 8.5%
3 | 2765 | 3.3%
4 | 1543 | 1.8%
5 | 1166 | 1.4%
6 | 650 | 0.8%
7 | 358 | 0.4%
8 | 294 | 0.3%
9 | 209 | 0.2%
Other values (59) | 1011 | 1.2%

Smallest values
Value | Count | Frequency (%)
0 | 7131 | 8.5%
1 | 61273 | 72.9%
2 | 7638 | 9.1%
3 | 2765 | 3.3%
4 | 1543 | 1.8%

Largest values
Value | Count | Frequency (%)
136 | 1 | < 0.1%
91 | 2 | < 0.1%
84 | 1 | < 0.1%
77 | 1 | < 0.1%
74 | 2 | < 0.1%

NofSnps
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 189
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.073323972
Minimum: 0
Maximum: 2632
Zeros: 75760
Zeros (%): 90.1%
Memory size: 656.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 3
Maximum: 2632
Range: 2632
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 17.06011828
Coefficient of variation (CV): 15.89465876
Kurtosis: 11343.82071
Mean: 1.073323972
Median Absolute Deviation (MAD): 0
Skewness: 89.28782133
Sum: 90200
Variance: 291.0476359
Monotonicity: Not monotonic
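
NofSnps is flagged as skewed because 90.1% of its values are zero while the maximum is 2632, which produces a very long right tail. The sketch below computes the moment-based skewness g1 = m3 / m2^(3/2) and the bias-adjusted variant that pandas reports. Which estimator the report used is an assumption here; for 84038 rows the two differ only negligibly. The function names are illustrative.

```python
import math


def skewness_g1(values):
    """Moment-based skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def skewness_adjusted(values):
    """Bias-adjusted Fisher-Pearson skewness (requires at least 3 values)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    return skewness_g1(values) * math.sqrt(n * (n - 1)) / (n - 2)


# Mostly zeros with one large value, a toy version of the NofSnps shape.
toy = [0, 0, 0, 0, 10]
g1 = skewness_g1(toy)
g1_adj = skewness_adjusted(toy)
print(g1, g1_adj)
```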

[Figure: Histogram with fixed size bins (bins=50)]

Most frequent values
Value | Count | Frequency (%)
0 | 75760 | 90.1%
1 | 2580 | 3.1%
2 | 1193 | 1.4%
3 | 796 | 0.9%
4 | 569 | 0.7%
5 | 406 | 0.5%
6 | 358 | 0.4%
7 | 273 | 0.3%
8 | 205 | 0.2%
9 | 203 | 0.2%
Other values (179) | 1695 | 2.0%

Smallest values
Value | Count | Frequency (%)
0 | 75760 | 90.1%
1 | 2580 | 3.1%
2 | 1193 | 1.4%
3 | 796 | 0.9%
4 | 569 | 0.7%

Largest values
Value | Count | Frequency (%)
2632 | 1 | < 0.1%
2258 | 1 | < 0.1%
1252 | 1 | < 0.1%
1160 | 1 | < 0.1%
991 | 1 | < 0.1%

source
Categorical

HIGH CARDINALITY

Distinct: 51
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 656.5 KiB

Top categories
Category | Count
CTD_human | 61485
GENOMICS_ENGLAND | 5408
PSYGENET | 3159
ORPHANET | 3133
UNIPROT | 1651
Other values (46) | 9202

Most frequent values
Value | Count | Frequency (%)
CTD_human | 61485 | 73.2%
GENOMICS_ENGLAND | 5408 | 6.4%
PSYGENET | 3159 | 3.8%
ORPHANET | 3133 | 3.7%
UNIPROT | 1651 | 2.0%
CTD_human;GENOMICS_ENGLAND;UNIPROT | 1411 | 1.7%
CTD_human;GENOMICS_ENGLAND;ORPHANET;UNIPROT | 1182 | 1.4%
CGI | 1084 | 1.3%
CTD_human;GENOMICS_ENGLAND | 979 | 1.2%
CLINGEN | 734 | 0.9%
Other values (41) | 3812 | 4.5%

[Figure: Frequencies of value counts]

Unique

Unique: 5
Unique (%): < 0.1%

[Figure: Histogram of lengths of the category]

Length

Max length: 55
Median length: 9
Mean length: 11.07481139
Min length: 3

Interactions

[81 interaction plots (Matplotlib SVG images) are not preserved in this text version.]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
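
The calculation described above can be sketched in a few lines of standard-library Python. This is an illustration of the formula, not the report's own implementation; `pearson_r` is an illustrative name. The 1/n factors in the covariance and the standard deviations cancel, so they are omitted.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


xs = [1, 2, 3, 4]
r_linear = pearson_r(xs, [3, 5, 7, 9])         # exact increasing line
r_rescaled = pearson_r(xs, [35, 55, 75, 95])   # same line, shifted and scaled
r_negative = pearson_r(xs, [9, 7, 5, 3])       # exact decreasing line
print(r_linear, r_rescaled, r_negative)
```

The second call illustrates the invariance under changes of location and scale: rescaling y leaves r unchanged.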

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
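
A minimal sketch of that calculation: replace each value by its rank, then apply the Pearson formula to the ranks. Giving tied values the average of their rank positions is an assumption (it is the common convention); the function names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (spread_x * spread_y)


# A nonlinear but strictly increasing relationship still gives rho = 1.
rho_cubic = spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125])
print(rho_cubic)
```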

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
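
The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition with a quadratic-time loop. Libraries often report a tie-corrected variant instead, so results on data with many ties may differ; `kendall_tau_a` is an illustrative name.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Six pairs in total: five concordant, one discordant (the swapped middle values).
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```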

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the Phik project documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
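
A sketch of a bias-corrected Cramér's V, written from the commonly cited form of Bergsma's correction: the χ²-based φ² is reduced by (k-1)(r-1)/(n-1), and the table dimensions r and k are shrunk by (r-1)²/(n-1) and (k-1)²/(n-1). Whether the report applies precisely this form is an assumption; `cramers_v_corrected` is an illustrative name.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equal-length sequences of categories."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corrected = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corrected = r - (r - 1) ** 2 / (n - 1)
    k_corrected = k - (k - 1) ** 2 / (n - 1)
    denominator = min(r_corrected - 1, k_corrected - 1)
    if denominator <= 0:
        return 0.0  # a constant column carries no association information
    return math.sqrt(phi2_corrected / denominator)


v_perfect = cramers_v_corrected(["a", "b"] * 10, ["x", "y"] * 10)
v_independent = cramers_v_corrected(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(v_perfect, v_independent)
```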

Missing values

[4 missing-value plots (Matplotlib SVG images) are not preserved in this text version.]

Sample

First rows

(index) | geneId | geneSymbol | DSI | DPI | diseaseId | diseaseName | diseaseType | diseaseClass | diseaseSemanticType | score | EI | YearInitial | YearFinal | NofPmids | NofSnps | source
0 | 1 | A1BG | 0.700 | 0.538 | C0019209 | Hepatomegaly | phenotype | C23;C06 | Finding | 0.30 | 1.000 | 2017.0 | 2017.0 | 1 | 0 | CTD_human
1 | 1 | A1BG | 0.700 | 0.538 | C0036341 | Schizophrenia | disease | F03 | Mental or Behavioral Dysfunction | 0.30 | 1.000 | 2015.0 | 2015.0 | 1 | 0 | CTD_human
2 | 2 | A2M | 0.529 | 0.769 | C0002395 | Alzheimer's Disease | disease | C10;F03 | Disease or Syndrome | 0.50 | 0.769 | 1998.0 | 2018.0 | 3 | 0 | CTD_human
3 | 2 | A2M | 0.529 | 0.769 | C0007102 | Malignant tumor of colon | disease | C06;C04 | Neoplastic Process | 0.31 | 1.000 | 2004.0 | 2019.0 | 1 | 0 | CTD_human
4 | 2 | A2M | 0.529 | 0.769 | C0009375 | Colonic Neoplasms | group | C06;C04 | Neoplastic Process | 0.30 | 1.000 | 2004.0 | 2004.0 | 1 | 0 | CTD_human
5 | 2 | A2M | 0.529 | 0.769 | C0011265 | Presenile dementia | disease | C10;F03 | Mental or Behavioral Dysfunction | 0.30 | 1.000 | 1998.0 | 2004.0 | 3 | 0 | CTD_human
6 | 2 | A2M | 0.529 | 0.769 | C0011570 | Mental Depression | disease | F01 | Mental or Behavioral Dysfunction | 0.30 | 1.000 | 1987.0 | 2000.0 | 2 | 0 | PSYGENET
7 | 2 | A2M | 0.529 | 0.769 | C0011581 | Depressive disorder | disease | F03 | Mental or Behavioral Dysfunction | 0.30 | 1.000 | 1987.0 | 2000.0 | 2 | 0 | PSYGENET
8 | 2 | A2M | 0.529 | 0.769 | C0019202 | Hepatolenticular Degeneration | disease | C16;C06;C18;C10 | Disease or Syndrome | 0.30 | 1.000 | 2013.0 | 2013.0 | 1 | 0 | CTD_human
9 | 2 | A2M | 0.529 | 0.769 | C0022660 | Kidney Failure, Acute | disease | C13;C12 | Disease or Syndrome | 0.30 | 1.000 | 2013.0 | 2013.0 | 1 | 0 | CTD_human

Last rows

(index) | geneId | geneSymbol | DSI | DPI | diseaseId | diseaseName | diseaseType | diseaseClass | diseaseSemanticType | score | EI | YearInitial | YearFinal | NofPmids | NofSnps | source
84028 | 106481323 | RNU6-456P | 0.931 | 0.077 | C2931456 | Prostate cancer, familial | disease | C04;C12 | Neoplastic Process | 0.30 | 1.0 | 2018.0 | 2018.0 | 1 | 0 | CTD_human
84029 | 106481323 | RNU6-456P | 0.931 | 0.077 | C4722327 | PROSTATE CANCER, HEREDITARY, 1 | disease | C04;C12 | Neoplastic Process | 0.30 | 1.0 | 2018.0 | 2018.0 | 1 | 0 | CTD_human
84030 | 106783499 | OPA8 | 0.839 | 0.231 | C4085249 | OPTIC ATROPHY 8 | disease | NaN | Disease or Syndrome | 0.30 | NaN | NaN | NaN | 0 | 0 | GENOMICS_ENGLAND
84031 | 107075310 | MTCO2P12 | 0.368 | 0.962 | C0268237 | Cytochrome-c Oxidase Deficiency | disease | C16;C18 | Disease or Syndrome; Congenital Abnormality | 0.33 | 1.0 | 1999.0 | 2011.0 | 0 | 0 | GENOMICS_ENGLAND
84032 | 107305681 | DHS6S1 | 1.000 | 0.077 | C0730294 | North Carolina macular dystrophy | disease | C16;C11 | Disease or Syndrome | 0.50 | 1.0 | 2016.0 | 2016.0 | 1 | 0 | CTD_human;ORPHANET
84033 | 109580095 | HBB-LCR | 0.743 | 0.115 | C0002875 | Cooley's anemia | disease | C16;C15 | Disease or Syndrome | 0.30 | NaN | NaN | NaN | 0 | 0 | CTD_human
84034 | 109580095 | HBB-LCR | 0.743 | 0.115 | C0005283 | beta Thalassemia | disease | C16;C15 | Disease or Syndrome | 0.30 | NaN | NaN | NaN | 0 | 0 | CTD_human
84035 | 109580095 | HBB-LCR | 0.743 | 0.115 | C0019025 | Hemoglobin F Disease | disease | C16;C15 | Disease or Syndrome | 0.30 | NaN | NaN | NaN | 0 | 0 | CTD_human
84036 | 109580095 | HBB-LCR | 0.743 | 0.115 | C0085578 | Thalassemia Minor | disease | C16;C15 | Disease or Syndrome | 0.30 | NaN | NaN | NaN | 0 | 0 | CTD_human
84037 | 109580095 | HBB-LCR | 0.743 | 0.115 | C0271979 | Thalassemia Intermedia | disease | C16;C15 | Disease or Syndrome | 0.30 | NaN | NaN | NaN | 0 | 0 | CTD_human
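
Several columns in these rows (diseaseClass, diseaseSemanticType, source) hold semicolon-joined lists such as "CTD_human;ORPHANET", so each distinct combination counts as its own category in the statistics above. The sketch below counts the individual tokens instead. It assumes ";" is a list separator, which is inferred from the sample rows rather than stated in the report; `split_multi` is a hypothetical helper.

```python
from collections import Counter


def split_multi(value, sep=";"):
    """Split a semicolon-joined field into clean tokens; None yields no tokens."""
    if value is None:
        return []
    return [part.strip() for part in value.split(sep) if part.strip()]


# A few `source` values taken from the sample rows, plus one missing value.
sources = ["CTD_human", "CTD_human;ORPHANET", "GENOMICS_ENGLAND", "PSYGENET", None]
token_counts = Counter(token for value in sources for token in split_multi(value))
print(token_counts.most_common())
```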